Skip to content

Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题#7244

Closed
Li-shi-ling wants to merge 10 commits intoAstrBotDevs:masterfrom
Li-shi-ling:fix/iss-7218
Closed

Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题#7244
Li-shi-ling wants to merge 10 commits intoAstrBotDevs:masterfrom
Li-shi-ling:fix/iss-7218

Conversation

@Li-shi-ling
Copy link
Copy Markdown
Contributor

@Li-shi-ling Li-shi-ling commented Mar 31, 2026

fixes #7218
修理了数据库选择的提供商消失,导致的数据库不显示问题

Modifications / 改动点

在AstrBot\astrbot\core\knowledge_base\kb_helper.py的文件给get_ep和get_rp添加了临时占位的提供商,该提供商实现的方法被调用将抛出报错,通过这样的方法将报错延迟到实际调用时,防止数据库在初始化时报错导致的数据库不显示。

  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

这是一个两个提供商都被删除的数据库,正常显示了

42e5cf1fee440dc8f33d0b02a6acc8cb

使用占位提供商的后台日志

00af5e1bfb63c70b652af9d2899259f0

测试截图

image

Checklist / 检查清单

  • [ X ] 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

没有新功能

  • [ X ] 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

有进行测试

  • [ X ] 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

没有引入

  • [ X ] 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

没有恶意代码

Summary by Sourcery

Bug Fixes:

  • Prevent knowledge bases from failing to display when their configured embedding or rerank providers are missing by logging and substituting temporary placeholder providers instead of raising initialization-time errors.

@auto-assign auto-assign bot requested review from Raven95676 and anka-afk March 31, 2026 16:32
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 31, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In get_ep / _ensure_vec_db, when embedding_provider_id is missing you now only log an error but still call get_provider_by_id with a falsy ID and continue; consider either returning a placeholder immediately or keeping the early raise to avoid unexpected behavior downstream.
  • The temporary provider classes (TempEmbeddingProvider, TempRerankProvider) are defined inside the methods and re-created on each call; consider moving them to module scope (or a shared helper) to avoid duplication and make their behavior easier to maintain.
  • In TempRerankProvider.rerank the error message mentions Embedding Provider instead of Rerank Provider, which is likely a copy-paste mistake and may confuse debugging when the placeholder is triggered.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `get_ep` / `_ensure_vec_db`, when `embedding_provider_id` is missing you now only log an error but still call `get_provider_by_id` with a falsy ID and continue; consider either returning a placeholder immediately or keeping the early `raise` to avoid unexpected behavior downstream.
- The temporary provider classes (`TempEmbeddingProvider`, `TempRerankProvider`) are defined inside the methods and re-created on each call; consider moving them to module scope (or a shared helper) to avoid duplication and make their behavior easier to maintain.
- In `TempRerankProvider.rerank` the error message mentions `Embedding Provider` instead of `Rerank Provider`, which is likely a copy-paste mistake and may confuse debugging when the placeholder is triggered.

## Individual Comments

### Comment 1
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="137-138" />
<code_context>

     async def get_ep(self) -> EmbeddingProvider:
         if not self.kb.embedding_provider_id:
-            raise ValueError(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
+            logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
         ep: EmbeddingProvider = await self.prov_mgr.get_provider_by_id(
             self.kb.embedding_provider_id,
</code_context>
<issue_to_address>
**issue (bug_risk):** After logging missing embedding_provider_id, the code still proceeds and may pass an invalid ID into get_provider_by_id.

Previously this branch raised and prevented use of an unset `embedding_provider_id`. Now it logs and then calls `get_provider_by_id(self.kb.embedding_provider_id)`, which will be `None` here, risking undefined or inconsistent behavior depending on `get_provider_by_id`’s implementation. Please either short‑circuit when the ID is missing (e.g., return a placeholder provider) or restore an early failure (raise or return a clearly invalid provider) so downstream code never relies on a `None`/invalid ID.
</issue_to_address>

### Comment 2
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="186-194" />
<code_context>
+                    super().__init__(provider_config, provider_settings)
+                    self.rerank_provider_id = rerank_provider_id
+
+                async def rerank(
+                    self,
+                    query: str,
+                    documents: list[str],
+                    top_n: int | None = None,
+                ):
+                    raise ValueError(
+                        f"无法找到 ID 为 {self.rerank_provider_id} 的 Embedding Provider"
+                    )
+
</code_context>
<issue_to_address>
**issue (typo):** TempRerankProvider error message refers to an Embedding Provider instead of a Rerank Provider.

The `TempRerankProvider.rerank` error message currently says `Embedding Provider` instead of `Rerank Provider`, which can be confusing when both provider types exist. Please update the message (and its localized text) to reference the rerank provider to match the actual failing dependency.

```suggestion
            class TempRerankProvider(RerankProvider):
                def __init__(
                    self,
                    provider_config: dict,
                    provider_settings: dict,
                    rerank_provider_id: str,
                ) -> None:
                    super().__init__(provider_config, provider_settings)
                    self.rerank_provider_id = rerank_provider_id

                async def rerank(
                    self,
                    query: str,
                    documents: list[str],
                    top_n: int | None = None,
                ):
                    raise ValueError(
                        f"无法找到 ID 为 {self.rerank_provider_id} 的 Rerank Provider"
                    )
```
</issue_to_address>

### Comment 3
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="167-168" />
<code_context>
+                        f"无法找到 ID 为 {self.embedding_provider_id} 的 Embedding Provider"
+                    )
+
+                def get_dim(self) -> int:
+                    return 512
+
+            ep: EmbeddingProvider = TempEmbeddingProvider(
</code_context>
<issue_to_address>
**issue (bug_risk):** Hardcoded embedding dimension 512 in TempEmbeddingProvider may not match the actual FAISS index or configuration.

`TempEmbeddingProvider.get_dim` always returns 512. If the existing FAISS index or config uses a different dimension, this will lead to runtime errors or index inconsistencies when `_ensure_vec_db` accesses the vector DB. Consider deriving the dimension from the knowledge base config (when available) or raising an explicit error instead of returning a hardcoded value.
</issue_to_address>

### Comment 4
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="136" />
<code_context>

     async def get_ep(self) -> EmbeddingProvider:
         if not self.kb.embedding_provider_id:
-            raise ValueError(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the placeholder provider classes and error-handling into shared, top-level helpers so the provider methods stay short, consistent, and easier to maintain.

You can keep the new “placeholder provider + logging” behavior while reducing complexity and duplication by:

* Moving the temp providers out of the methods.
* Sharing construction logic.
* Fixing the error message typo.
* Keeping `_ensure_vec_db` simple.

For example:

```python
# module-level helpers

class TempEmbeddingProvider(EmbeddingProvider):
    def __init__(self, provider_id: str) -> None:
        super().__init__({}, {})
        self._provider_id = provider_id

    async def get_embedding(self, text: str) -> list[float]:
        raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Embedding Provider")

    async def get_embeddings(self, texts: list[str]) -> list[list[float]]:
        raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Embedding Provider")

    def get_dim(self) -> int:
        return 512


class TempRerankProvider(RerankProvider):
    def __init__(self, provider_id: str) -> None:
        super().__init__({}, {})
        self._provider_id = provider_id

    async def rerank(
        self,
        query: str,
        documents: list[str],
        top_n: int | None = None,
    ):
        raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Rerank Provider")
```

Then the methods become straightforward, with no inner classes and consistent behavior:

```python
async def get_ep(self) -> EmbeddingProvider:
    if not self.kb.embedding_provider_id:
        logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
        return TempEmbeddingProvider("<missing>")

    ep: EmbeddingProvider | None = await self.prov_mgr.get_provider_by_id(
        self.kb.embedding_provider_id,  # type: ignore[arg-type]
    )
    if not ep:
        logger.error(
            f"无法找到 ID 为 {self.kb.embedding_provider_id} 的 Embedding Provider, 使用占位 Embedding Provider"
        )
        return TempEmbeddingProvider(self.kb.embedding_provider_id)

    return ep
```

```python
async def get_rp(self) -> RerankProvider | None:
    if not self.kb.rerank_provider_id:
        return None

    rp: RerankProvider | None = await self.prov_mgr.get_provider_by_id(
        self.kb.rerank_provider_id,  # type: ignore[arg-type]
    )
    if not rp:
        logger.error(
            f"无法找到 ID 为 {self.kb.rerank_provider_id} 的 Rerank Provider, 使用占位 Rerank Provider"
        )
        return TempRerankProvider(self.kb.rerank_provider_id)

    return rp
```

`_ensure_vec_db` can stay focused on constructing the vector DB, delegating all error semantics to `get_ep` / `get_rp`:

```python
async def _ensure_vec_db(self) -> FaissVecDB:
    if not self.kb.embedding_provider_id:
        logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")

    ep = await self.get_ep()
    rp = await self.get_rp()

    vec_db = FaissVecDB(
        doc_store_path=str(self.kb_dir / "doc.db"),
        index_store_path=str(self.kb_dir / "index.faiss"),
        embedding_provider=ep,
        rerank_provider=rp,
    )
    ...
```

This keeps the new behavior (logging + placeholder providers + delayed error), but:

* Removes inner classes from methods.
* Centralizes and deduplicates the temp-provider logic.
* Fixes the “Embedding Provider”/“Rerank Provider” message inconsistency.
* Makes `get_ep` / `get_rp` easier to read and maintain.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/knowledge_base/kb_helper.py
Comment thread astrbot/core/knowledge_base/kb_helper.py
Comment thread astrbot/core/knowledge_base/kb_helper.py
Comment thread astrbot/core/knowledge_base/kb_helper.py
@dosubot dosubot bot added area:core The bug / feature is about astrbot's core, backend feature:knowledge-base The bug / feature is about knowledge base labels Mar 31, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76259ed2be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread astrbot/core/knowledge_base/kb_helper.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request modifies the knowledge base helper to use placeholder providers instead of raising immediate exceptions when an embedding or rerank provider is missing. Feedback includes addressing a hardcoded vector dimension of 512 that may cause mismatches, moving the placeholder class definitions to the top level for better efficiency and readability, correcting a copy-paste error in an exception message, and removing redundant error logging in the vector database initialization logic.

Comment thread astrbot/core/knowledge_base/kb_helper.py
Comment thread astrbot/core/knowledge_base/kb_helper.py
Comment thread astrbot/core/knowledge_base/kb_helper.py Outdated
Comment thread astrbot/core/knowledge_base/kb_helper.py
@Li-shi-ling Li-shi-ling closed this Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend feature:knowledge-base The bug / feature is about knowledge base size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]知识库加载和显示问题

1 participant